DeepSeek: What Happened, What Matters, and Why It’s Interesting

Update: 2025-01-28

First:

- Apologies for the audio! We had a production error…

What’s new:

- DeepSeek has made breakthroughs in both how AI systems are trained (making training much more affordable) and how they run in real-world use (making them faster and more efficient)

Details


- FP8 Training: Working With Less Precise Numbers
  - Traditional AI training requires extremely precise numbers
  - DeepSeek found you can use less precise numbers (like rounding $10.857643 to $10.86)
  - This cuts memory and computation needs significantly, with minimal impact on quality
  - Like teaching someone math using rounded numbers instead of carrying every decimal place (see the sketch below)
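
A minimal sketch of the rounding idea, using int8 quantization as a simpler stand-in for FP8 (an 8-bit floating-point format); the helper names are invented, and this is not DeepSeek’s actual training kernel:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Round float32 values onto 255 integer levels, remembering one scale factor."""
    scale = np.abs(x).max() / 127.0 + 1e-12          # largest value maps to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale              # convert back to real units

x = np.random.randn(4, 8).astype(np.float32)         # activations
w = np.random.randn(8, 3).astype(np.float32)         # weights

qx, sx = quantize_int8(x)
qw, sw = quantize_int8(w)

# Multiply the "rounded" versions cheaply, then rescale the result.
approx = dequantize(qx.astype(np.int32) @ qw.astype(np.int32), sx * sw)
exact = x @ w
print("max error:", np.abs(approx - exact).max())    # typically small next to the values themselves
```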


- Learning from Other AIs (Distillation)
  - Traditional approach: an AI learns everything from scratch by studying massive amounts of data
  - DeepSeek's approach: use existing AI models as teachers
  - Like having experienced programmers mentor new developers (see the sketch below)
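
A minimal sketch of classic soft-target distillation, assuming a toy PyTorch setup; DeepSeek’s distilled models were reportedly trained on outputs generated by the larger model, but the core idea is the same: the student learns from a teacher model rather than from raw data alone.

```python
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(16, 10)        # stand-in for a large, frozen teacher model
student = torch.nn.Linear(16, 10)        # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0                        # softens the teacher's probabilities

for step in range(100):
    x = torch.randn(32, 16)              # a batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)      # the teacher's "opinion"
    student_logits = student(x)

    # The student is trained to match the teacher's softened distribution.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```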


- Trial & Error Learning (for their R1 model)
  - Started with some basic "tutoring" from advanced models
  - Then let it practice solving problems on its own
  - When it found good solutions, these were fed back into training
  - Led to "Aha moments" where R1 discovered better ways to solve problems
  - Finally, polished its ability to explain its thinking clearly to humans (see the sketch below)
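
A toy sketch of that trial-and-error loop, not DeepSeek’s actual reinforcement-learning recipe; `model.generate`, `reward_fn`, and `reinforce_step` are hypothetical placeholders standing in for real generation, answer checking, and a policy update:

```python
def training_round(model, problems, reward_fn, num_samples=8):
    """One round of practice: try each problem several times, keep what worked."""
    for problem in problems:
        # 1. Let the model attempt the problem several times.
        attempts = [model.generate(problem) for _ in range(num_samples)]

        # 2. Score each attempt, e.g. 1.0 if the final answer checks out, else 0.0.
        rewards = [reward_fn(problem, attempt) for attempt in attempts]

        # 3. Nudge the model toward attempts that scored above the group average
        #    and away from those that scored below it.
        baseline = sum(rewards) / len(rewards)
        for attempt, reward in zip(attempts, rewards):
            reinforce_step(model, problem, attempt, advantage=reward - baseline)
```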


- Smart Team Management (Mixture of Experts)
  - Instead of one massive system that does everything, DeepSeek built a team of specialists
  - Like running a software company with:
    - 256 specialists who focus on different areas
    - 1 generalist who helps with everything
    - A smart project manager who assigns work efficiently
  - For each task, only 8 specialists plus the generalist are needed
  - More efficient than having everyone work on everything (see the sketch below)
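
A minimal sketch of the routing idea as a toy PyTorch module; this is not DeepSeek’s production MoE layer, but the sizes mirror the 256-specialist, 8-active, 1-generalist description above:

```python
import torch
import torch.nn.functional as F

class TinyMoE(torch.nn.Module):
    def __init__(self, dim=64, num_experts=256, top_k=8):
        super().__init__()
        self.experts = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(num_experts)])  # 256 specialists
        self.shared_expert = torch.nn.Linear(dim, dim)                # the generalist
        self.router = torch.nn.Linear(dim, num_experts)               # the project manager
        self.top_k = top_k

    def forward(self, x):                  # x: (num_tokens, dim)
        # The router scores every expert, but only the top 8 per token do any work.
        weights, chosen = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        outputs = []
        for t in range(x.shape[0]):
            routed = sum(w * self.experts[int(i)](x[t])
                         for w, i in zip(weights[t], chosen[t]))
            outputs.append(self.shared_expert(x[t]) + routed)
        return torch.stack(outputs)

moe = TinyMoE()
print(moe(torch.randn(5, 64)).shape)   # torch.Size([5, 64]); only 8 of 256 experts ran per token
```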


- Efficient Memory Management (Multi-head Latent Attention)
  - Traditional AI is like keeping complete transcripts of every conversation
  - DeepSeek's approach is like taking smart meeting minutes
  - Captures key information in compressed format
  - Similar to how JPEG compresses images (see the sketch below)
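
A toy illustration of the compress-then-expand idea, assuming made-up sizes and untrained projections; in the real model these projections are learned, and the details of Multi-head Latent Attention are more involved:

```python
import torch

dim, latent_dim = 512, 64                     # hypothetical sizes
down_proj = torch.nn.Linear(dim, latent_dim)  # "take meeting minutes"
up_proj_k = torch.nn.Linear(latent_dim, dim)  # rebuild keys from the minutes
up_proj_v = torch.nn.Linear(latent_dim, dim)  # rebuild values from the minutes

hidden_states = torch.randn(1000, dim)        # 1,000 past tokens of conversation

# The cache stores 64 numbers per token instead of 2 x 512 (full keys + values).
kv_cache = down_proj(hidden_states)
print(kv_cache.shape)                         # torch.Size([1000, 64])

# When attention needs them, keys and values are reconstructed from the "minutes".
keys = up_proj_k(kv_cache)
values = up_proj_v(kv_cache)
```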


- Looking Ahead (Multi-Token Prediction)
  - Traditional AI reads one word at a time
  - DeepSeek looks ahead and predicts two words at once
  - Like a skilled reader who can read ahead while maintaining comprehension (see the sketch below)
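
A toy sketch of training with an extra look-ahead objective; the structure and loss weighting here are invented for illustration and are not DeepSeek’s architecture:

```python
import torch
import torch.nn.functional as F

vocab, dim = 1000, 64
backbone = torch.nn.Embedding(vocab, dim)      # stand-in for the transformer backbone
next_head = torch.nn.Linear(dim, vocab)        # predicts the next token (t+1)
lookahead_head = torch.nn.Linear(dim, vocab)   # also predicts the token after that (t+2)

tokens = torch.randint(0, vocab, (8, 32))      # a batch of token ids
hidden = backbone(tokens)

# Usual next-token loss, plus an extra loss for the token after next, so each
# pass over the text provides more training signal.
loss_next = F.cross_entropy(
    next_head(hidden[:, :-1]).reshape(-1, vocab), tokens[:, 1:].reshape(-1))
loss_ahead = F.cross_entropy(
    lookahead_head(hidden[:, :-2]).reshape(-1, vocab), tokens[:, 2:].reshape(-1))
loss = loss_next + 0.3 * loss_ahead            # the 0.3 weighting is a made-up choice
```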




Why This Matters

- Cost Revolution: Training costs of $5.6M (vs. hundreds of millions) suggest a future where AI development isn't limited to tech giants.
- Working Around Constraints: Shows how limitations can drive innovation. DeepSeek achieved state-of-the-art results without access to the most powerful chips (at least that’s the best conclusion at the moment).




What’s Interesting

- Efficiency vs. Power: Challenges the assumption that advancing AI requires ever-increasing computing power; sometimes smarter engineering beats brute force.
- Self-Teaching AI: R1's ability to develop reasoning capabilities through pure reinforcement learning suggests AIs can discover problem-solving methods on their own.
- AI Teaching AI: The success of distillation shows how knowledge can be transferred between AI models, potentially leading to compounding improvements over time.
- IP for Free: If DeepSeek can be such a fast follower through distillation, what advantage is there for OpenAI, Google, or another company in releasing a novel model?


Helen and Dave Edwards